N94-11534 

ANALYZING HUMAN ERRORS IN FLIGHT MISSION 

OPERATIONS 


Kristin J. Bruno 
Linda L. Welz 

Jet Propulsion Laboratory, 
California Institute of Technology 
4800 Oak Grove Drive 
M/S 125-233 

Pasadena, California 91101 
(818) 354-7891 

email: kbruno@spal.jpl.nasa.gov 


G. Michael Barnes 

Computer Science Department (COMS) 
California State University 
Northridge, California 91324 
(818) 885-2299 

email: renzo@ms.secs.csun.edu 


Josef Sherif 

Jet Propulsion Laboratory, 
California Institute of Technology 
4800 Oak Grove Drive 
M/S 125-233 

Pasadena, California 91101 
(818) 354-8365 


Abstract 

A long-term program is in progress at JPL to reduce cost and risk of flight mission operations through a defect 
prevention/error management program. The main thrust of this program is to create an environment in which the 
performance of the total system, both the human operator and the computer system, is optimized. To this end, 1580 
Incident Surprise Anomaly reports (ISAs) from 1977-1991 were analyzed from the Voyager and Magellan projects. 
A Pareto analysis revealed that 38% of the errors were classified as human errors. A preliminary cluster analysis 
based on the Magellan human errors (204 ISAs) is presented here. The resulting clusters described the underlying 
relationships among the ISAs. Initial models of human error in flight mission operations are presented. Next, the 
Voyager ISAs will be scored and included in the analysis. Eventually, these relationships will be used to derive a 
theoretically motivated and empirically validated model of human error in flight mission operations. Ultimately, 
this analysis will be used to make continuous process improvements to end-user applications and training 
requirements. This Total Quality Management approach will enable the management and prevention of errors in the
future.

Introduction 

A long-term program is in progress at JPL to reduce 
cost and risk of flight mission operations through a 
defect prevention/error management program. Flight 
mission operations require systems that place human 
operators in a demanding, high-risk environment. This
applies not only to the mission controllers in the "dark 
room", but also to the mission planners and flight 
teams developing sequences, to the Deep Space 
Network (DSN) operators configuring and monitoring 
the DSN, and to the engineering teams who must 
analyze spacecraft performance. This environment 
generally requires operators to make rapid, critical 
decisions and solve problems based on limited 
information, while following standard procedures 
closely. The mission operations environment is, 
therefore, inherently risky because each decision that a 
human operator makes is potentially mission critical, 
and in a high-demand environment, human errors occur 
frequently. Given the high risk in such an 
environment, these human errors can have grave 
financial (e.g., the Soviet loss of PHOBOS) or loss-of- 
life (in manned space flight) consequences. 

To contain this risk at JPL, flight mission operations 
procedures include intensive human reviews. In 
addition, when an error does occur, rapid rework is
required to ensure mission success. This strategy has
worked well to reduce risk and has ensured the success 
of JPL missions. However, the large human labor 
investment in these reviews and rework has contributed 
substantially to the cost of flight mission operations. 
Prevention of such errors would reduce both cost and 
risk of flight projects. The motivation of this program 
is that risk can be contained more cost effectively by 
preventing human errors rather than reworking them. 
The goal of this program is the management, reduction 
and prevention of errors. The key facet of this program 
is to create an environment in which the performance 
of both the human operator and the computer system is 
optimized. Systems must be designed to enhance 
normal human performance (e.g., as described in Card, 
Moran, & Newell, 1983); training programs must be 
designed to alleviate likely errors; and functions that are 
human-error prone should be automated. Thus, to 
design and implement a successful defect/error 
prevention program requires a theoretically motivated 
model of human problem solving and decision making 
based on current theories of knowledge representation, 
the structure of memory, schemas, and mental models 
(e.g., Anderson & Bower, 1973; Norman, 1988). 
Further, such a model must be data-validated to ensure
its ultimate applicability to the flight mission 
operations environment. Principles of cognitive 
psychology, human-computer interaction, and Total 
Quality Management (TQM) are used to analyze past 
errors and make changes to end-user applications,
training requirements, and task policies and procedures
to prevent or manage these errors in the future.

Method and Results 

The process developed for this program can be viewed 
as a continuous process improvement loop consisting of
five steps: 

1. Institute the Mission Operations and Command
   Assurance (MO&CA) function on JPL flight
   projects.

2. Analyze Incident Surprise Anomaly (ISA) data for
   causes of errors and patterns of causes.

3. Develop a prototype of a human process model of
   the underlying factors causing cognitive errors
   during flight operations based on the ISA data.

4. Develop a defect prevention/error management
   methodology based on the flight operations human
   process model.

5. Insert the methodology into Flight Mission
   Operations system development and training via
   system requirements and training prototypes and
   into policies and procedures via MO&CA.

Thus far in the program Step 1 has been successfully 
completed. MO&CA teams have been installed on 
flight projects to help reduce cost and risk. The main 
benefits of these teams are realized from collecting and 
analyzing error data in the form of ISA Reports. Based 
on these reports, MO&CA teams make
recommendations for subsequent changes to flight
operations procedures, and work with the flight 
mission operations teams to incorporate the 
recommendations. The work of these teams is ongoing 
on several projects. The current work, reported in this 
paper, consists of extended analysis of error data (ISAs) 
to determine patterns of causes and develop a prototype 
human process model (Steps 2 and 3). Currently, 
error data with cause codes is available for three flight 
projects over a 14-year period.

The goal of Step 2 was to reduce the data to a 
meaningful subset of the most frequent causes of errors 
based on the TQM principle of investigating the most 
prevalent problems first in a defect prevention/error 
management program. ISA reports from two projects,
Voyager and Magellan, were classified in one of 12
cause code categories. Each project used a slightly 
different taxonomy of detailed cause categories within 
the high-level cause. Thus, the detailed analysis 
entailed developing a composite cause category 
taxonomy of data for both projects making the detailed 
cause category analysis equivalent. The categories used 
were developed by MO&CA teams based on major 
functions in the flight mission operations 
environment. An early Pareto analysis was performed
to determine the most frequent high-level causes of 
errors. The analysis showed that, of the 1580 ISA 
reports recorded, the three cause categories of Human 
(38%), Software (20%), and Documentation (10%) 
accounted for 68% of the errors (Figure 1). 

[Figure 1: chart of ISAs by cause category. Axis label: ISA Cause Category - Voyager (1977-1989) & Magellan (1989-1991). Legend: S/C - Spacecraft; Proj - Project Policy; Approl - Approval Process.]

Figure 1
Voyager and Magellan ISAs - By Cause Code
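
The Pareto ranking described above can be sketched in a few lines. This is only an illustration, not the tooling used in the study: the function name `pareto_table` is hypothetical, and the input contains only the three percentages reported in the text, since the shares of the remaining nine cause categories are not given.

```python
def pareto_table(shares):
    """Return (category, share, cumulative share) rows, largest share first."""
    rows = []
    cumulative = 0
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    for category, share in ranked:
        cumulative += share
        rows.append((category, share, cumulative))
    return rows


# Percentages reported in the text for the three leading cause categories.
reported = {"Software": 20, "Human": 38, "Documentation": 10}

table = pareto_table(reported)
for category, share, cumulative in table:
    print(f"{category:<14}{share:>4}%{cumulative:>5}%")
```

The cumulative column of the last row reproduces the 68% figure quoted for the three leading categories.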

Based on the high number of human errors, subsequent
analysis of more specific cause codes was restricted to
ISA reports for the Magellan project that were classified
as Human Error. The taxonomy of cause codes was
then used to score the 204 Magellan Human ISAs.
Each ISA was read by a team of 2 investigators who
assigned as many cause codes as were appropriate. In
addition, each cause code was assigned a value of 0, 1,
or 2. These values were assigned as follows: a "2" was
assigned if this cause alone caused the anomaly to occur
and the ISA to be written; a "1" was assigned if this
cause was an ancillary cause which contributed to the
anomaly, but would not by itself have caused it; a "0"
was assigned if this cause did not apply to the ISA.
In total, for the 204 Magellan Human Error ISAs
examined, 269 cause codes were assigned.
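
One way to picture the scoring scheme is as a vector with one 0/1/2 entry per cause code. The sketch below is an assumption-laden illustration: the composite code labels (such as "HUM4-4"), the four-code taxonomy subset, the function names, and the example ISA are all hypothetical and are not taken from the study's data.

```python
VALID_SCORES = {0, 1, 2}  # 2 = sole cause, 1 = ancillary cause, 0 = not applicable


def score_isa(assigned, taxonomy):
    """Expand the assigned cause codes into a full score vector.

    Codes that were not assigned receive a score of 0.
    """
    for code, value in assigned.items():
        if code not in taxonomy:
            raise ValueError(f"unknown cause code: {code}")
        if value not in VALID_SCORES:
            raise ValueError(f"score must be 0, 1, or 2, got {value!r}")
    return [assigned.get(code, 0) for code in taxonomy]


def primary_causes(assigned):
    """Return the codes scored 2, i.e. causes sufficient on their own."""
    return sorted(code for code, value in assigned.items() if value == 2)


# Hypothetical subset of the taxonomy and a hypothetical scored ISA.
taxonomy = ["HUM4-3", "HUM4-4", "HUM5-4", "HUM6-1"]
example = {"HUM4-4": 2, "HUM6-1": 1}

vector = score_isa(example, taxonomy)
print(vector, primary_causes(example))
```

Representing every ISA over the same ordered list of codes is what makes the distance computation in the subsequent cluster analysis well defined.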

Next, the Magellan human error data was subjected to a 
cluster analysis to identify clusters of cause code 
patterns. Interpretation of these clusters was expected 
to reveal the underlying factors causing cognitive 
errors. BMDP's cluster analysis, a multidimensional
scaling technique, was used. The program groups the
pair of cases (in this case, ISAs) with the shortest
Euclidean distance (the square root of the sum of
squares of the differences between the values of the
variables for two cases). In a step-wise manner, two
cases or clusters are grouped such that initially each 
case is an individual cluster and at the end all cases are 
in one cluster. In the present analysis, 25 clusters were 
formed first at distance 0. Thus the internal distance 
among ISAs in each of those 25 clusters was 0; that is, 
the ISAs were scored identically. Figure 2 shows a 
Pareto chart of the size of the first 25 clusters. The 4 
largest clusters contained 25, 21, 19, and 15 ISAs 
respectively, followed by a gap. The next cluster, of 
size 9, was the cluster of ISAs of unknown cause. 
Thus, only the 4 largest clusters were selected for 
interpretation. These 4 clusters consisted of ISAs with 
only one cause rated "2". They were Oversight (12%),
Lack of Communication (10%), Edit Error in Product 
(9%), and Omission of Action (7%), respectively, and 
accounted for 39% of the 204 ISAs (Figure 3). At the 
next major level of clustering, distance 3.3, these 4 
clusters joined, along with others, to account for 52% 
of the total ISAs. Finally, at the third major level of 
clustering, distance 6.6, 95% of the ISAs joined. The
final 5% of the ISAs did not join until distance 14.8;
these errors were rare, dissimilar cases.

Figure 2
Magellan Human Error ISAs - By Cause Code

Figure 3
Magellan Human Error ISA Cluster Analysis
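
The step-wise grouping described above can be sketched as a small agglomerative procedure. This is a minimal illustration, not BMDP's implementation: the linkage rule used here (single linkage, the distance between the closest members of two clusters) is an assumption, since the text does not state which rule BMDP applied, and the four score vectors are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Square root of the sum of squared differences between two score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(cases):
    """Merge the closest pair of clusters until one cluster remains.

    Returns a list of (distance, members) tuples in merge order.
    Assumes single linkage, which may differ from the rule BMDP used.
    """
    clusters = [[index] for index in range(len(cases))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            distance = min(
                euclidean(cases[p], cases[q])
                for p in clusters[i]
                for q in clusters[j]
            )
            if best is None or distance < best[0]:
                best = (distance, i, j)
        distance, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        merges.append((distance, merged))
    return merges


# Hypothetical score vectors; cases 0 and 1 are scored identically.
cases = [[2, 0, 0], [2, 0, 0], [0, 2, 0], [2, 1, 0]]

merges = agglomerate(cases)
for distance, members in merges:
    print(f"{distance:.2f} {members}")
```

Identically scored cases merge first at distance 0, which mirrors how the 25 initial clusters in the analysis were formed from ISAs with identical scores.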


In order to model the underlying causes of the ISAs
within the four initial clusters, each ISA within a
cluster was reexamined for its specific characteristics.
Characteristics common to all ISAs in the cluster were
compared to known cognitive phenomena, particularly
with human error theory. Several taxonomies of
human error have been proposed (e.g., Norman, 1981,
1983; Reason, 1990). However, there is no general
agreement on a single taxonomy. Thus, it has been
suggested that a taxonomy must be tailored to a given
environment (Senders & Moray, 1991). The taxonomy
adopted here, in Appendix A, is tailored and simplified
from Reason (1990). Figure 4 shows the cognitive
mechanisms in the taxonomy. Tasks are divided into a
planning and an execution component. The error types
are Oversight (generally known as a slip), Omission of
Action (generally known as a lapse), a mistake, and a
violation. This general taxonomy was then used to
model the common characteristics within each cluster.

Error Type                    Task Type
                              Planning      Execution

Oversight (slip)              OK            Followed plan, but performed a wrong action
Omission of Action (lapse)    OK            Followed plan, but skipped an action
Mistake                       Faulty Plan   Followed faulty plan
Violation                     OK            Intentionally deviated from plan

Figure 4
Human Error

The four highest frequency clusters that joined at
distance 3.3 exhibited at least two common cognitive
elements, omission of action and oversight. In
omission of action, a goal was acquired, but a subgoal
was not executed for some reason, typically cognitive
capture or a distraction. Cognitive capture generally
refers to a psychological phenomenon in which a
well-rehearsed action takes control of a less familiar action.
This is particularly true when attention is drawn
elsewhere. For example, one ISA (8508) documented a
case in which DSN station operators did not notice for
three days a special condition in the Sequence of Events
(SOE) during Magellan support. The problem was
caused by the fact that Magellan support had become
routine and the SOE rarely changed. Thus, this routine
support "captured" the processing of the changed SOE
so that some new steps were omitted. This error
resulted in a loss of data. The second common
cognitive element in this cluster was oversight, in 
which the status of the task is not evident at any 
given point in time. Thus, an incorrect action (i.e., 
inappropriate at this point in the task) may be 
performed, or an incorrect object may be used. For 
example, another ISA (1973) documented an error in 
which a file was created using an old version of the 
required software. The problem was traced to the fact 
that, as new versions of the file-generation software
became available, they were simply installed on the 
appropriate machines. As time passed, multiple 
versions of the software were available. To the 
operator generating the file, it was not clear which 
was the correct version of the software. Thus, the 
task status was not evident, and a wrong object (the 
old software) was used. The lack of distinction 
between the current software and previous releases 
generated a description error. There were no salient 
attributes to facilitate the use of the correct software 
release. This error resulted in loss of time, since the 
problem had to be researched and the file regenerated, 
thus increasing operations costs. In addition, risk 
increased since an incorrect file was generated. 

As these common cognitive elements were uncovered, 
it became clear that the single common element 
underlying this cluster was that all the ISAs were 
execution errors. Thus, this cluster, at distance 3.3,
was labeled "Execution Errors" (Figure 3). Finally, at
the third major level of clustering, at distance 6.6,
planning errors joined the execution errors, thus
suggesting a label of "Execution and Planning
Errors."
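
The distinction between planning and execution errors follows directly from the taxonomy in Figure 4, and can be written as a small decision function. This is a sketch under stated assumptions: the function name, the three boolean attributes, and the way the two example ISAs are encoded are hypothetical simplifications of how an investigator would actually judge an ISA.

```python
def classify_error(plan_ok, intentional, action_skipped):
    """Map the attributes from Figure 4 to (error type, task component at fault)."""
    if not plan_ok:
        # A faulty plan that was followed faithfully is a planning error.
        return ("Mistake", "planning")
    if intentional:
        # The plan was sound but the operator deliberately departed from it.
        return ("Violation", "execution")
    if action_skipped:
        return ("Omission of Action (lapse)", "execution")
    # The plan was followed, but a wrong action or wrong object was used.
    return ("Oversight (slip)", "execution")


# Encodings of the two ISAs discussed in the text (attribute values assumed).
isa_8508 = classify_error(plan_ok=True, intentional=False, action_skipped=True)
isa_1973 = classify_error(plan_ok=True, intentional=False, action_skipped=False)
print(isa_8508, isa_1973)
```

Under this encoding both example ISAs fall in the execution component, consistent with the "Execution Errors" label given to the cluster at distance 3.3.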

Preliminary Conclusions 

Although this defect prevention and error management 
program is in its infancy, some preliminary 
conclusions can be drawn from the initial analysis. 

1. Flight mission operations human error 
data is amenable to interpretation via 
human error theory. JPL currently has a large 
volume of ISA data. While this data may be locally 
analyzed within a project during operations, 
particularly during a major anomaly, the analysis is 
typically ad hoc and localized to that one project. 
This preliminary work demonstrates that by modeling 
error data, underlying causes can be investigated in a 
systematic way, and classes of errors in this 
environment (such as execution errors) can be 
uncovered. In addition, this general information can 
then be shared across projects. 

2. In JPL flight mission operations, a 
significant portion of human ISAs are 
errors in executing a task. The results of this 
study showed that 52% of the 204 Magellan human 
errors analyzed were execution errors. This provides a 
focus for possible solutions on execution problems. 
It is also speculated that execution errors will be 
found to be preventable or manageable. 

3. System requirements, policies and
procedures can be written to prevent known 
cognitive errors. As was previously mentioned, 
through this systematic analysis, classes of errors 
will be uncovered. In this way, solutions to manage 
errors that do occur, or solutions to prevent them 
from occurring can be generated. For example, to 
prevent errors like the one documented in ISA 8508, 
special conditions in a file can be highlighted to avoid 
capture in routine tasks. To avoid errors such as ISA 
1973, proper configuration management policies and 
procedures should be written and enforced in 
operations. In this case, archiving old versions of the 
file-generation software off-line would eliminate 
operator confusion about which software to use in 
generating files and thus prevent this oversight or 
slip. 

In summary, a method for analyzing human errors in 
flight mission operations has been presented. 
Although the work is in a preliminary phase, it is clear that such
a method, in which error data is subjected to a cluster
analysis, the resulting clusters are examined for
common cognitive elements, and these elements are
modeled using cognitive psychological theory, can
lead to an understanding of the causes of errors and
typical classes of errors. Using TQM principles,
these findings can then be used at the beginning of a
project’s life cycle to improve system requirements, 
project policies and procedures, and operator training 
to manage errors that do occur, or prevent them from 
occurring at all. It is only through such a systematic 
analysis method that cost and risk can be reduced in 
flight mission operations. Finally, it is clear that 
this analysis has wide applicability to other errors. It 
is currently planned that this program will eventually 
expand to include analysis of other errors in Figure 1 
such as software and documentation, and to other 
environments such as the DSN and system 
development.

APPENDIX A

Taxonomy of Human Error Cause Codes

HUMANS (HUM)

Code    Cause                                Description

HUM1    Inadequate Knowledge/Inexperience    Error made due to inexperience or lack of knowledge if person is experienced.
  -1    Procedures
  -2    [missing in source]
  -3    S/C characteristics
  -4    Command procedures
  -5    Ground Operations
  -6    S/C Status
  -7    Constraints
  -8    Schedule change
  -9    Anticipated command effect
  -10   [illegible in source]
  -11   S/W Ground

HUM2    [missing in source]                  Error made due to an intentional deviation from plan
  -1    Procedures
  -2    Policies
  -3    Guidelines
  -4    Flight / Mission Rules
  -5    [missing in source]
  -6    Ground Operations - Compatibility
  -7    Operational Requirement

HUM4    Error                                Error made due to an unintentional deviation from plan
  -1    Wrong Plan - Mistake                 Plan is wrong, but was executed correctly
  -2    Plan OK - Error Unknown              Plan is correct, error is unknown
  -3    Plan OK - Omission of Action         Plan is correct, but an action was omitted during execution
  -4    Plan OK - Oversight                  Plan is correct, but an action during execution was wrong

HUM5    Product Interface                    Error made while producing a product
  -1    Error in Copying
  -2    Error in calculation
  -3    Data entry error                     Error in original data entry
  -4    Edit Error                           Error in editing an existing product

HUM6    Communication
  -1    Lack of communication
  -2    Miscommunication